Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
We investigate whether Deep Reinforcement Learning (Deep RL) is able to
synthesize sophisticated and safe movement skills for a low-cost, miniature
humanoid robot that can be composed into complex behavioral strategies in
dynamic environments. We used Deep RL to train a humanoid robot with 20
actuated joints to play a simplified one-versus-one (1v1) soccer game. We first
trained individual skills in isolation and then composed those skills
end-to-end in a self-play setting. The resulting policy exhibits robust and
dynamic movement skills such as rapid fall recovery, walking, turning, kicking
and more; and transitions between them in a smooth, stable, and efficient
manner - well beyond what is intuitively expected from the robot. The agents
also developed a basic strategic understanding of the game, and learned, for
instance, to anticipate ball movements and to block opponent shots. The full
range of behaviors emerged from a small set of simple rewards. Our agents were
trained in simulation and transferred to real robots zero-shot. We found that a
combination of sufficiently high-frequency control, targeted dynamics
randomization, and perturbations during training in simulation enabled
good-quality transfer, despite significant unmodeled effects and variations
across robot instances. Although the robots are inherently fragile, minor
hardware modifications together with basic regularization of the behavior
during training led the robots to learn safe and effective movements while
still performing in a dynamic and agile way. Indeed, even though the agents
were optimized for scoring, in experiments they walked 156% faster, took 63%
less time to get up, and kicked 24% faster than a scripted baseline, while
efficiently combining the skills to achieve the longer term objectives.
Examples of the emergent behaviors and full 1v1 matches are available on the
supplementary website: https://sites.google.com/view/op3-socce
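The abstract credits zero-shot sim-to-real transfer partly to targeted dynamics randomization: resampling simulator physics each episode so the policy cannot overfit one set of dynamics. A minimal sketch of that idea, with parameter names and ranges invented for illustration (they are not the paper's actual values):

```python
import random

# Hypothetical randomization ranges; each episode draws fresh physics
# parameters so the policy must be robust across the whole range.
RANDOMIZATION_RANGES = {
    "joint_friction":  (0.8, 1.2),   # multiplier on nominal friction
    "motor_strength":  (0.9, 1.1),   # multiplier on nominal torque limits
    "control_latency": (0.00, 0.02), # seconds of actuation delay
}

def sample_dynamics(rng: random.Random) -> dict:
    """Draw one episode's physics parameters uniformly from each range."""
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}

rng = random.Random(0)
episode_params = sample_dynamics(rng)
```

In practice the ranges are tuned ("targeted") to cover the measured variation across real robot instances rather than set as wide as possible.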
Machine Learning for the Zwicky Transient Facility
The Zwicky Transient Facility is a large optical survey in multiple filters producing hundreds of thousands of transient alerts per night. We describe here various machine learning (ML) implementations and plans to make maximal use of the large data set by taking advantage of the temporal nature of the data, and further combining it with other data sets. We start with the initial steps of separating bogus candidates from real ones and separating stars from galaxies, and go on to the classification of real objects into various classes. Besides the usual methods (e.g., based on features extracted from light curves) we also describe early plans for alternate methods including the use of domain adaptation, and deep learning. In a similar fashion we describe efforts to detect fast-moving asteroids. We also describe the use of the Zooniverse platform for helping with classifications through the creation of training samples, and active learning. Finally, we mention the synergistic aspects of ZTF and LSST from the ML perspective.
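The "usual methods" the abstract mentions classify objects from summary features extracted from light curves. A toy sketch of that feature-extraction step, with features chosen for illustration rather than taken from the ZTF pipeline:

```python
import statistics

def light_curve_features(mags):
    """Simple summary statistics over a sequence of magnitudes,
    of the kind fed to a downstream classifier."""
    return {
        "amplitude": max(mags) - min(mags),      # peak-to-peak variation
        "std": statistics.pstdev(mags),          # overall variability
        "median": statistics.median(mags),       # typical brightness
    }

feats = light_curve_features([15.0, 15.5, 14.8])
```

Real pipelines use dozens of such features (periodicity, skewness, color terms) before handing the vector to a classifier.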
The United States COVID-19 Forecast Hub dataset
Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 2020, the Forecast Hub is a dataset with point and probabilistic forecasts of incident cases, incident hospitalizations, incident deaths, and cumulative deaths due to COVID-19 at the county, state, and national levels in the United States. Included forecasts represent a variety of modeling approaches, data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish a standardized and comparable set of short-term forecasts from modeling teams. These data can be used to develop ensemble models, communicate forecasts to the public, create visualizations, compare models, and inform policies regarding COVID-19 mitigation. These open-source data are available via download from GitHub, through an online API, and through R packages.
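One use the abstract names is building ensemble models from the Hub's probabilistic forecasts. Since each team reports predicted values at fixed quantile levels, a simple ensemble takes the median across models at each level. A sketch with invented model names and numbers:

```python
import statistics

QUANTILES = (0.25, 0.5, 0.75)  # illustrative; the Hub uses a longer fixed set

# Hypothetical submissions: predicted incident deaths at each quantile level.
model_forecasts = {
    "model_a": {0.25: 90.0, 0.5: 110.0, 0.75: 140.0},
    "model_b": {0.25: 80.0, 0.5: 100.0, 0.75: 130.0},
    "model_c": {0.25: 95.0, 0.5: 120.0, 0.75: 160.0},
}

def quantile_median_ensemble(forecasts, quantiles=QUANTILES):
    """Median across models, computed separately at each quantile level."""
    return {q: statistics.median(f[q] for f in forecasts.values())
            for q in quantiles}

ensemble = quantile_median_ensemble(model_forecasts)
```

Taking the median per quantile keeps the ensemble robust to a single outlier model while preserving a coherent predictive distribution.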
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Despite their wide adoption, the underlying training and memorization
dynamics of very large language models is not well understood. We empirically
study exact memorization in causal and masked language modeling, across model
sizes and throughout the training process. We measure the effects of dataset
size, learning rate, and model size on memorization, finding that larger
language models memorize training data faster across all settings.
Surprisingly, we show that larger models can memorize a larger portion of the
data before over-fitting and tend to forget less throughout the training
process. We also analyze the memorization dynamics of different parts of speech
and find that models memorize nouns and numbers first; we hypothesize and
provide empirical evidence that nouns and numbers act as a unique identifier
for memorizing individual training examples. Together, these findings present
another piece of the broader puzzle of trying to understand what actually
improves as models get bigger.
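The quantity under study, exact memorization, can be sketched concretely: a training example counts as memorized when the model's greedy prediction of the next token, given the example's context, reproduces the true token exactly. The "model" below is a stand-in lookup table, not a real language model:

```python
def exact_memorization_rate(predict, examples):
    """Fraction of (context, next_token) pairs the model reproduces exactly
    under greedy decoding."""
    hits = sum(1 for ctx, tok in examples if predict(ctx) == tok)
    return hits / len(examples)

# Invented toy corpus and a stand-in "model" that has memorized two examples.
training_examples = [
    ("the cat sat on the", "mat"),
    ("once upon a", "time"),
    ("four score and seven", "years"),
]
lookup = {"the cat sat on the": "mat", "once upon a": "time"}
rate = exact_memorization_rate(lambda ctx: lookup.get(ctx), training_examples)
```

Tracking this rate across model sizes and training steps is what lets the paper compare memorization speed and forgetting between models.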
Investigating Generalization by Controlling Normalized Margin
Weight norm and margin enter learning theory through the normalized margin,
i.e., the margin divided by the weight norm. Since standard neural net
optimizers do not control normalized margin, it is hard to test whether this quantity
causally relates to generalization. This paper designs a series of experimental
studies that explicitly control normalized margin and thereby tackle two
central questions. First: does normalized margin always have a causal effect on
generalization? The paper finds that no -- networks can be produced where
normalized margin has seemingly no relationship with generalization, counter to
the theory of Bartlett et al. (2017). Second: does normalized margin ever have
a causal effect on generalization? The paper finds that yes -- in a standard
training setup, test performance closely tracks normalized margin. The paper
suggests a Gaussian process model as a promising explanation for this behavior.
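The central quantity can be written down directly: the classification margin (correct-class logit minus the best competing logit) divided by a weight-norm measure. Bartlett et al. (2017) use a spectral-norm-based complexity; the sketch below substitutes a product of Frobenius norms as a simplified stand-in:

```python
import math

def margin(logits, true_class):
    """Correct-class logit minus the best competing logit."""
    others = [v for i, v in enumerate(logits) if i != true_class]
    return logits[true_class] - max(others)

def frobenius(matrix):
    """Frobenius norm of a weight matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))

def normalized_margin(logits, true_class, weight_matrices):
    """Margin divided by the product of per-layer weight norms
    (a simplified proxy for the spectral complexity in the theory)."""
    norm_product = math.prod(frobenius(w) for w in weight_matrices)
    return margin(logits, true_class) / norm_product
```

Because optimizers leave this ratio uncontrolled, the paper's experiments intervene on it directly, e.g. by rescaling weights, to probe its causal effect on generalization.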
DeepStreaks: identifying fast-moving objects in the Zwicky Transient Facility data with deep learning
We present DeepStreaks, a convolutional-neural-network, deep-learning system designed to efficiently identify streaking fast-moving near-Earth objects that are detected in the data of the Zwicky Transient Facility (ZTF), a wide-field, time-domain survey using a dedicated 47 deg^2 camera attached to the Samuel Oschin 48-inch Telescope at the Palomar Observatory in California, United States. The system demonstrates a 96–98 per cent true positive rate, depending on the night, while keeping the false positive rate below 1 per cent. The sensitivity of DeepStreaks is quantified by the performance on the test data sets as well as using known near-Earth objects observed by ZTF. The system is deployed and adapted for usage within the ZTF Solar system framework and has significantly reduced human involvement in the streak identification process, from several hours to typically under 10 min per day.
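The headline numbers (96–98 per cent true positive rate, false positive rate below 1 per cent) come from comparing classifier outputs against labelled data. A small sketch of how those two rates are computed; the sample labels are invented for illustration:

```python
def rates(labels, predictions):
    """Return (true_positive_rate, false_positive_rate) for binary labels."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    return tp / (tp + fn), fp / (fp + tn)

# Toy evaluation set: 1 = real streak, 0 = bogus candidate.
labels      = [1, 1, 1, 0, 0, 0]
predictions = [1, 1, 0, 0, 0, 0]
tpr, fpr = rates(labels, predictions)
```

Reporting both rates matters here because real streaks are rare: a high true positive rate alone would be cheap to achieve by flagging everything.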